n-Best Reranking for the Efficient Integration of Word Sense Disambiguation and Statistical Machine Translation

Authors

  • Lucia Specia
  • Baskaran Sankaran
  • Maria das Graças Volpe Nunes
Abstract

Although it has always been thought that Word Sense Disambiguation (WSD) can be useful for Machine Translation, only recently have efforts been made towards integrating the two tasks to show that this assumption is valid, particularly for Statistical Machine Translation (SMT). While different approaches have been proposed and results have started to converge in a positive direction, it is not yet clear how these applications should be integrated so that the strengths of both can be exploited. This paper aims to contribute to the recent investigation of the usefulness of WSD for SMT by using n-best reranking to efficiently integrate WSD with SMT. This allows the use of rich contextual WSD features, which current SMT systems otherwise do not exploit. Experiments with English-Portuguese translation in a syntactically motivated phrase-based SMT system and both symbolic and probabilistic WSD models showed significant improvements in BLEU scores.
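The general idea of n-best reranking with a WSD feature can be illustrated with a minimal sketch. This is not the authors' implementation: the `Hypothesis` class, the `wsd_feature` definition (a sum of log WSD probabilities), the probability floor, and the single `wsd_weight` are all assumptions made for illustration. The sketch only shows how a decoder's n-best list might be re-sorted once an extra, context-rich score is added to each hypothesis.

```python
import math
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One entry of an n-best list (all fields are illustrative assumptions)."""
    translation: str
    model_score: float  # log-score assigned by the base SMT decoder
    chosen: dict        # source-word position -> target word used by this hypothesis


def wsd_feature(hyp, wsd_probs, floor=1e-6):
    """Sum of log WSD probabilities for the translations this hypothesis chose.

    wsd_probs maps a source position to {target_word: probability}.
    Positions without a WSD prediction contribute nothing; unseen target
    words are floored to avoid log(0).
    """
    total = 0.0
    for pos, target in hyp.chosen.items():
        if pos in wsd_probs:
            total += math.log(max(wsd_probs[pos].get(target, 0.0), floor))
    return total


def rerank(nbest, wsd_probs, wsd_weight=1.0):
    """Return the n-best list sorted by model score plus the weighted WSD feature."""
    return sorted(
        nbest,
        key=lambda h: h.model_score + wsd_weight * wsd_feature(h, wsd_probs),
        reverse=True,
    )
```

In practice the feature weight would be tuned on held-out data alongside the decoder's other feature weights; here it is a fixed hypothetical value. Because reranking only touches the n-best list, the WSD model can use arbitrarily rich context without changing the decoder itself.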

Similar resources

Combining Morphosyntactic Enriched Representation with n-best Reranking in Statistical Translation

The purpose of this work is to explore the integration of morphosyntactic information into the translation model itself, by enriching words with their morphosyntactic categories. We investigate word disambiguation using morphosyntactic categories, n-best hypotheses reranking, and the combination of both methods with word or morphosyntactic n-gram language model reranking. Experiments are carrie...

WSD for n-best reranking and local language modeling in SMT

We integrate semantic information at two stages of the translation process of a state-of-the-art SMT system. A Word Sense Disambiguation (WSD) classifier produces a probability distribution over the translation candidates of source words which is exploited in two ways. First, the probabilities serve to rerank a list of n-best translations produced by the system. Second, the WSD predictions are u...

Bootstrapping Phrase-based Statistical Machine Translation via WSD Integration

Besides the word order problem, word choice is another major obstacle for machine translation. Though phrase-based statistical machine translation (SMT) has the advantage of making word choices based on local context, exploiting larger context is an interesting research topic. Recently, there have been a number of studies on integrating word sense disambiguation (WSD) into phrase-based SMT. The WSD score...

How Phrase Sense Disambiguation outperforms Word Sense Disambiguation for Statistical Machine Translation

We present comparative empirical evidence arguing that a generalized phrase sense disambiguation approach better improves statistical machine translation than ordinary word sense disambiguation, along with a data analysis suggesting the reasons for this. Standalone word sense disambiguation, as exemplified by the Senseval series of evaluations, typically defines the target of disambiguation as ...

Word Sense Disambiguation vs. Statistical Machine Translation

We directly investigate a subject of much recent debate: do word sense disambiguation models help statistical machine translation quality? We present empirical results casting doubt on this common, but unproved, assumption. Using a state-of-the-art Chinese word sense disambiguation model to choose translation candidates for a typical IBM statistical MT system, we find that word sense disambiguati...

Publication date: 2008